
Add multi-instance SDK support via delegating providers #177

Open

JacksonWeber wants to merge 6 commits into microsoft:main from JacksonWeber:feature/multi-instance-sdk



JacksonWeber (Contributor) commented Jun 17, 2026

Summary

Adds opt-in support for running multiple isolated Microsoft OpenTelemetry SDK instances in a single Node.js runtime. Today useMicrosoftOpenTelemetry() registers global providers per signal, so a second initialization clobbers the first. This PR introduces a parent/child delegating-provider architecture so independent pipelines can coexist — and, crucially, each instance owns its own set of OpenTelemetry instrumentations, settings, and sampling, bound directly to that instance's providers.

The prioritized scenario this enables: a customer registers different instrumentations for the Azure Monitor exporter than for the A365 (or OTLP) exporter in the same process (e.g. full HTTP/SQL/Redis to Azure Monitor, GenAI-only to A365).

GA constraint respected: all changes are additive and non-breaking. The existing single-instance useMicrosoftOpenTelemetry() path is untouched.

What's new

  • createMicrosoftOpenTelemetryInstance(options, { makeDefault? }) returns a MicrosoftOpenTelemetryInstance handle (getTracer/getMeter/getLogger, runWithInstance, forceFlush, shutdown). Each instance builds a standalone pipeline plus its own instrumentations, sampler, and exporter.
  • runWithMicrosoftOpenTelemetryInstance(id, fn) binds an ambient current instance so code using the global OTel API routes to the right pipeline.

Design

src/distro/multiInstance/:

  • instanceRegistry.ts — registry + default instance + AsyncLocalStorage-backed ambient current instance.
  • delegatingProviders.ts — Parent Tracer/Meter/Logger providers that resolve the current child per call (never cached) and delegate, with Noop fallbacks.
  • globalSetup.ts — registers the parent providers + a single shared context manager + composite propagator exactly once, and installs the SDKStats instrumentation patcher once.
  • instance.ts — builds each child as standalone NodeTracerProvider/MeterProvider/LoggerProvider (no NodeSDK.start(), no global registration), wiring Azure Monitor handlers, A365 export, caller processors, and a console fallback. It then builds this instance's instrumentations from its own instrumentationOptions and binds them to the child providers via registerInstrumentations, before registering the instance.
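The registry in instanceRegistry.ts can be pictured roughly as follows. This is a minimal sketch, not the PR's code: it assumes a plain `Map` keyed by instance id, stores the ambient id in a dedicated `AsyncLocalStorage` (the PR carries it on the OpenTelemetry active context instead), and all names (`registerInstance`, `runWithInstance`, `resolveCurrent`) are hypothetical. It also assumes the first registered instance becomes the default, which the PR does not state.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical stand-in for a child pipeline. In the PR, a child instance
// owns standalone NodeTracerProvider/MeterProvider/LoggerProvider objects.
interface ChildInstance {
  id: string;
}

const instances = new Map<string, ChildInstance>();
let defaultId: string | undefined;

// Ambient current-instance id; AsyncLocalStorage carries it across awaits,
// timers, and callbacks started inside the bound scope.
const ambient = new AsyncLocalStorage<string>();

// Assumption: the first registered instance becomes the default unless a
// later one passes makeDefault = true.
function registerInstance(inst: ChildInstance, makeDefault = false): void {
  instances.set(inst.id, inst);
  if (makeDefault || defaultId === undefined) defaultId = inst.id;
}

function unregisterInstance(id: string): void {
  instances.delete(id);
  if (defaultId === id) defaultId = undefined;
}

// Bind `id` as the ambient instance while `fn` runs. Unknown or stale ids are
// not bound, so resolution behaves as if this call were absent.
function runWithInstance<T>(id: string, fn: () => T): T {
  return instances.has(id) ? ambient.run(id, fn) : fn();
}

// Resolve on every call: the ambient instance if it is still registered,
// otherwise the default instance (undefined if none exists).
function resolveCurrent(): ChildInstance | undefined {
  const id = ambient.getStore();
  if (id !== undefined && instances.has(id)) return instances.get(id);
  return defaultId !== undefined ? instances.get(defaultId) : undefined;
}
```

Because `resolveCurrent()` runs on each call rather than being cached, an instance registered or removed later is picked up immediately, and a stale id passed to `runWithInstance` degrades to the default instead of silently routing nowhere.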

Per-instance instrumentations (the key mechanism). Because each instrumentation is bound directly to its instance's providers, the spans/metrics/logs it produces flow only to that instance's exporter. Two instances with different instrumentationOptions therefore feed their respective exporters with different instrumentation sets. These built-in instrumentations do not depend on the ambient current instance and bypass the delegating providers. The delegating parents remain the routing mechanism only for telemetry created through the global OpenTelemetry API, such as manual trace.getTracer(...) calls or third-party instrumentations registered without a provider, which still resolve to the ambient (via runWithInstance) or default instance.
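The difference between the two routing paths can be modeled in a few lines. This is an illustrative sketch under simplifying assumptions: `SinkTracer` and `HypotheticalHttpInstrumentation` are made-up stand-ins (string arrays play the role of exporters), not OpenTelemetry classes. It mirrors the functional test setup of HTTP on for A and off for B, and shows that the bound instrumentation ignores the ambient scope while a global-API call follows it.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical minimal tracer: records span names into its instance's sink,
// which stands in for that instance's exporter.
class SinkTracer {
  readonly sink: string[];
  constructor(sink: string[]) {
    this.sink = sink;
  }
  startSpan(name: string): void {
    this.sink.push(name);
  }
}

const exportedA: string[] = [];
const exportedB: string[] = [];
const children = new Map<string, SinkTracer>([
  ["A", new SinkTracer(exportedA)],
  ["B", new SinkTracer(exportedB)],
]);
const ambient = new AsyncLocalStorage<string>();
const DEFAULT_ID = "A";

// Global-API path (like trace.getTracer(...) returning a delegating tracer):
// the target child is resolved on each call from the ambient id or the default.
const globalTracer = {
  startSpan(name: string): void {
    const id = ambient.getStore() ?? DEFAULT_ID;
    (children.get(id) ?? children.get(DEFAULT_ID)!).startSpan(name);
  },
};

// Built-in path: the instrumentation captures its own instance's tracer when
// bound, loosely like registerInstrumentations({ tracerProvider: child, ... }).
class HypotheticalHttpInstrumentation {
  private readonly tracer: SinkTracer;
  constructor(tracer: SinkTracer) {
    this.tracer = tracer;
  }
  onOutgoingRequest(url: string): void {
    this.tracer.startSpan(`HTTP GET ${url}`);
  }
}

// HTTP enabled for A only, so only A gets an instrumentation object.
const httpForA = new HypotheticalHttpInstrumentation(children.get("A")!);

ambient.run("B", () => {
  httpForA.onOutgoingRequest("https://example.com"); // exported by A, ignoring ambient B
  globalTracer.startSpan("manual-span"); // exported by B, the ambient instance
});
```

After the run, A holds only the HTTP span and B holds only the manual span, which is the per-exporter split the PR is after.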

A365 export & defaults. Each instance can target A365 (A365SpanProcessor + Agent365Exporter). When an instance targets A365 without Azure Monitor, the A365 GenAI-focused instrumentation defaults are applied per instance. _applyA365InstrumentationDefaults was moved into src/distro/instrumentations.ts and is now shared by both the single- and multi-instance paths (re-exported from distro.ts for existing consumers).

Third-party instrumentations & touching globals

Registering a third-party instrumentation. The per-instance instrumentationOptions covers the built-in set (HTTP, Azure SDK, DB clients, loggers, GenAI). To add an arbitrary third-party OpenTelemetry instrumentation, a customer registers it the standard OTel way and does not pass a provider:

```ts
import { registerInstrumentations } from "@opentelemetry/instrumentation";
registerInstrumentations({ instrumentations: [new SomeThirdPartyInstrumentation()] });
```

When no tracerProvider/meterProvider/loggerProvider is supplied, OTel binds the instrumentation to the global providers — which, after ensureGlobalSetup(), are our delegating parent providers. So the third-party instrumentation transparently receives delegating tracers/meters/loggers.

How its telemetry is routed. The delegating provider resolves the target instance per call (never cached): the ambient instance bound by runWithInstance, else the default instance. So:

  • Telemetry the instrumentation emits while the customer's workload runs inside instance.runWithInstance(() => …) lands on that instance's exporter.
  • Outside any runWithInstance, it lands on the default instance.
  • Before any instance exists, the Noop fallbacks make the calls safe (they never throw), so instrumentation that loads early does not crash.
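The per-call resolution and Noop fallback described in this list can be sketched as below. This is a simplified model, assuming a hypothetical `MiniTracer` interface in place of the OpenTelemetry `Tracer` and a dedicated `AsyncLocalStorage` for the ambient id; it is not the code in delegatingProviders.ts. The key property shown is that a tracer obtained before any instance exists keeps working and starts routing once instances are registered, because it never caches its target.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

interface MiniTracer {
  startSpan(name: string): void;
}

// Noop fallback: safe to call before any instance exists, never throws.
const noopTracer: MiniTracer = { startSpan: () => {} };

const children = new Map<string, MiniTracer>();
let defaultId: string | undefined;
const ambient = new AsyncLocalStorage<string>();

// Ambient instance if registered, else the default, else nothing.
function resolveChild(): MiniTracer | undefined {
  const id = ambient.getStore();
  if (id !== undefined && children.has(id)) return children.get(id);
  return defaultId !== undefined ? children.get(defaultId) : undefined;
}

// Parent provider: hands out tracers that never cache a child and
// re-resolve the target instance on every call.
class DelegatingTracerProvider {
  getTracer(_name: string): MiniTracer {
    return {
      startSpan: (spanName: string) => (resolveChild() ?? noopTracer).startSpan(spanName),
    };
  }
}

// A third-party instrumentation grabs its tracer early, before any instance exists.
const provider = new DelegatingTracerProvider();
const earlyTracer = provider.getTracer("some-third-party-instrumentation");
earlyTracer.startSpan("before-any-instance"); // handled by the Noop fallback

// Instances appear later; the same tracer object now routes per call.
const spansA: string[] = [];
const spansB: string[] = [];
children.set("A", { startSpan: (n) => { spansA.push(n); } });
children.set("B", { startSpan: (n) => { spansB.push(n); } });
defaultId = "A";

earlyTracer.startSpan("outside-any-scope"); // default instance A
ambient.run("B", () => earlyTracer.startSpan("inside-B")); // ambient instance B
```

Resolving on every call costs a map lookup per span, which is the trade-off for never pinning an early-acquired tracer to the wrong (or a missing) pipeline.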

Why "touching globals" is safe. A third-party instrumentation (or an SDK it bundles) may reach for process-global OTel state. Each case is handled:

  • Re-registering global providers (trace.setGlobalTracerProvider, metrics.setGlobalMeterProvider, logs.setGlobalLoggerProvider): the OTel API honors only the first registration and turns later calls into a no-op (with a diag warning). ensureGlobalSetup() installs the delegating parents once, up front, and the child providers are deliberately never set as global. A third-party instrumentation therefore cannot overwrite the delegators, hijack routing, or pin all telemetry to a single pipeline.
  • Reading the global provider (trace.getTracerProvider() / getTracer): returns the delegating parent, so it inherits the per-call routing above.
  • Context manager & propagators (context.setGlobalContextManager, propagation.setGlobalPropagator): also once-only, and installed once by ensureGlobalSetup() as a single shared AsyncLocalStorageContextManager + CompositePropagator. Context and propagation are intentionally process-wide and shared by every instance (the ambient current instance id itself rides on the active context), so a third-party instrumentation using context.active()/.with()/propagation interoperates correctly and can't install a competing manager.
  • Module monkey-patching (patching http, a DB driver, etc.): this is inherently process-global and independent of providers. Because the instrumentation holds a delegating tracer, the patched code still emits into whichever instance is ambient at call time. If two instances enable the same built-in instrumentation, each gets its own instrumentation object bound to its own providers; the underlying module may be wrapped by both (standard OTel behavior), but each wrapper carries its own instance-bound tracer, so spans still land on the correct instance.
  • SDKStats bookkeeping: the autoloader enableInstrumentations patch is installed once in ensureGlobalSetup(), so third-party registrations are still counted.
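The "first registration wins" reasoning can be made concrete with a small model. This sketch is a deliberate simplification and is NOT the @opentelemetry/api implementation: the global slots are a plain `Map`, the diagnostic is just a pushed string (the real API's exact diagnostic behavior is not modeled), and `ensureGlobalSetupSketch` is a hypothetical stand-in for the PR's `ensureGlobalSetup()`.

```ts
// A simplified model of once-only global registration (not the real OTel API).
type ProviderLike = { readonly label: string };

const globals = new Map<string, ProviderLike>();
const diagLog: string[] = [];

// First registration wins; later attempts are ignored and recorded.
function setGlobal(api: string, provider: ProviderLike): boolean {
  if (globals.has(api)) {
    diagLog.push(`duplicate registration of API: ${api} ignored`);
    return false;
  }
  globals.set(api, provider);
  return true;
}

// Hypothetical stand-in for ensureGlobalSetup(): idempotent, body runs once.
let globalSetupDone = false;
function ensureGlobalSetupSketch(): void {
  if (globalSetupDone) return;
  globalSetupDone = true;
  setGlobal("trace", { label: "delegating-tracer-provider" });
  setGlobal("metrics", { label: "delegating-meter-provider" });
  setGlobal("logs", { label: "delegating-logger-provider" });
  setGlobal("context", { label: "shared-async-local-storage-context-manager" });
  setGlobal("propagation", { label: "composite-propagator" });
}

ensureGlobalSetupSketch();
ensureGlobalSetupSketch(); // second call is a no-op

// A third-party instrumentation tries to install its own tracer provider.
const accepted = setGlobal("trace", { label: "third-party-tracer-provider" });
```

Since the delegating parents claim every slot up front, `accepted` is false and the trace slot still holds the delegating provider, so routing stays in the SDK's hands.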

Deterministic binding (note). Built-in per-instance instrumentations are bound directly to their instance's providers, so they need no ambient context. A globally-registered third-party instrumentation instead routes via the ambient/default instance; to pin it deterministically to one instance, run the workload it observes inside that instance's runWithInstance(...). A future enhancement could accept instrumentations?: Instrumentation[] in the per-instance options and bind them directly, matching the built-in behavior.

Testing

  • test/internal/functional/multiInstance.test.ts:
    • Two instances with distinct connection strings — each instance's spans reach only its own exporter, and global-API spans inside runWithInstance route to the bound instance.
    • New: two instances with different instrumentationOptions (HTTP on for A, off for B) bind different instrumentation sets to distinct providers — verifying per-exporter instrumentation.
  • Verified end-to-end with a standalone Node app: two instances in one process — A (Azure Monitor, HTTP+Azure SDK on, 100% sampling) and B (console, HTTP off, 25% sampling). The outgoing HTTPS request produced an HTTP dependency span on A only; sampling kept 40/40 on A and ~10/40 on B; Azure Monitor transmitted with no errors.
  • Full unit + functional suites green (910 passed); ESM + CJS build, lint, and format pass.

Dependencies

  • Promotes @opentelemetry/context-async-hooks from transitive to a direct dependency (imported directly to register the context manager, since the multi-instance path does not call NodeSDK.start()). No overrides used.

Introduce createMicrosoftOpenTelemetryInstance to run multiple isolated SDK
instances in one Node.js runtime. Parent (delegating) Tracer/Meter/Logger
providers route per-call to the current child instance, resolved via an
AsyncLocalStorage-backed ambient context (runWithInstance) with a default
fallback. Each instance owns its own resource, sampler, processors, readers,
and exporters. Additive and opt-in; the existing useMicrosoftOpenTelemetry
single-instance path is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 17, 2026 21:38

Copilot AI (Contributor) left a comment

Pull request overview

Adds an opt-in multi-instance Microsoft OpenTelemetry SDK mode that allows multiple isolated telemetry pipelines to coexist in a single Node.js process by registering global delegating providers that route per-call to an “ambient” (AsyncLocalStorage-bound) instance, falling back to a default instance.

Changes:

  • Introduces createMicrosoftOpenTelemetryInstance() / runWithMicrosoftOpenTelemetryInstance() and the MicrosoftOpenTelemetryInstance handle type.
  • Implements multi-instance runtime: instance registry + id binding, one-time global setup (context manager/propagator + parent providers), and delegating tracer/meter/logger providers.
  • Adds functional coverage for isolation and ambient routing; promotes @opentelemetry/context-async-hooks to a direct dependency.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/internal/functional/multiInstance.test.ts | New functional tests validating per-instance isolation and ambient routing via global API |
| src/types.ts | Adds MicrosoftOpenTelemetryInstance public interface type |
| src/index.ts | Re-exports new multi-instance APIs/types from the package entrypoint |
| src/distro/multiInstance/instanceRegistry.ts | Registry + AsyncLocalStorage-backed “current instance” binding and resolution |
| src/distro/multiInstance/instance.ts | Builds per-instance child providers/pipelines and lifecycle methods (flush/shutdown) |
| src/distro/multiInstance/index.ts | Exposes multi-instance APIs from the distro layer |
| src/distro/multiInstance/globalSetup.ts | One-time process-global setup for context manager/propagator + parent providers |
| src/distro/multiInstance/delegatingProviders.ts | Parent providers + delegating tracer/meter/logger implementations |
| src/distro/index.ts | Surfaces multi-instance exports through src/distro |
| package.json | Adds @opentelemetry/context-async-hooks direct dependency |
| package-lock.json | Locks the new dependency for npm ci installs |


Comment thread src/distro/multiInstance/instanceRegistry.ts
Comment thread src/distro/multiInstance/instance.ts Outdated
Comment thread test/internal/functional/multiInstance.test.ts
Comment thread test/internal/functional/multiInstance.test.ts
- withInstance: skip binding unknown/stale ids so resolution falls back to the
  default instance instead of producing silent no-op telemetry.
- instance.shutdown: wrap disposers via Promise.resolve().then so a synchronous
  throw is captured and does not abort the rest of shutdown.
- multiInstance test: also disable the logs provider and the global context
  manager in afterEach to prevent cross-test contamination.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
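The shutdown fix above can be illustrated with a short sketch. It assumes disposers are zero-argument functions that may return a promise; `runDisposers` is a hypothetical name, not the PR's `instance.shutdown`. The point is that `disposers.map((d) => d())` would let a synchronous throw escape `map` before `Promise.allSettled` ever sees the other disposers, whereas `Promise.resolve().then(...)` converts that throw into a rejection.

```ts
type Disposer = () => void | Promise<void>;

// Run every disposer and collect failures. Wrapping each call in
// Promise.resolve().then(...) turns a synchronous throw into a rejected
// promise, so one failing disposer cannot abort the others.
async function runDisposers(disposers: Disposer[]): Promise<unknown[]> {
  const results = await Promise.allSettled(
    disposers.map((dispose) => Promise.resolve().then(() => dispose())),
  );
  return results
    .filter((r): r is PromiseRejectedResult => r.status === "rejected")
    .map((r) => r.reason);
}
```

For example, with three disposers where the middle one throws synchronously, the first and third still run and the returned array contains only the middle disposer's error.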

rads-1996 (Member) left a comment

Is it possible to break the PR into small chunks, so it is easier to review? Or does it make logical sense to have everything together?

JacksonWeber (Contributor, Author) commented Jun 17, 2026

> Is it possible to break the PR into small chunks, so it is easier to review? Or does it make logical sense to have everything together?

I don't believe that will work well for this PR, as it implements a single self-contained feature. If any single component is split off, it won't function on its own. The only way I can think of to split it is per signal, but I'm not sure that would make it much easier to review.

@JacksonWeber JacksonWeber marked this pull request as draft June 25, 2026 20:40
JacksonWeber and others added 2 commits June 30, 2026 19:41
Each createMicrosoftOpenTelemetryInstance() now builds its own OpenTelemetry
instrumentation set from its instrumentationOptions and binds it directly to
that instance's providers via registerInstrumentations, so different exporters
(e.g. Azure Monitor vs. A365) can run different instrumentations and settings
in the same process. Adds A365 export support and per-instance A365 GenAI
instrumentation defaults, tracks per-instance registration in SDKStats, and
shares _applyA365InstrumentationDefaults between the single- and multi-instance
paths.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Resolve dep version conflicts (adopt main's 0.219.0/2.8.0, keep context-async-hooks) and adapt to api-logs createNoopLogger.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JacksonWeber JacksonWeber marked this pull request as ready for review July 2, 2026 20:47
@JacksonWeber JacksonWeber requested a review from Copilot July 2, 2026 20:47

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated 1 comment.

Comment thread src/distro/multiInstance/delegatingProviders.ts Outdated
…instance

DelegatingMeter previously resolved the instance at instrument creation time, pinning .add()/.record() to whichever instance was current then (usually the default). Synchronous instruments now re-resolve the current instance on every measurement so metrics follow runWithInstance like traces/logs. Observable instruments remain bound at creation (collected async, outside any scope).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
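The bug fixed in this commit can be reproduced with a toy model of synchronous counters. This is an illustrative sketch only: per-instance totals in a `Map` stand in for each child MeterProvider, the ambient id lives in a dedicated `AsyncLocalStorage`, and `createPinnedCounter`/`createDelegatingCounter` are hypothetical names contrasting the old and new DelegatingMeter behavior. Observable instruments are not modeled.

```ts
import { AsyncLocalStorage } from "node:async_hooks";

// Hypothetical per-instance totals standing in for each child MeterProvider.
const totals = new Map<string, number>([["A", 0], ["B", 0]]);
const ambient = new AsyncLocalStorage<string>();
const DEFAULT_ID = "A";

function currentInstanceId(): string {
  const id = ambient.getStore();
  return id !== undefined && totals.has(id) ? id : DEFAULT_ID;
}

function record(id: string, value: number): void {
  totals.set(id, (totals.get(id) ?? 0) + value);
}

// Before the fix: the instance is resolved once, when the instrument is created.
function createPinnedCounter() {
  const id = currentInstanceId();
  return { add: (value: number): void => record(id, value) };
}

// After the fix: a synchronous instrument re-resolves on every measurement.
function createDelegatingCounter() {
  return { add: (value: number): void => record(currentInstanceId(), value) };
}

// Both counters are created at module load, outside any runWithInstance scope.
const pinned = createPinnedCounter();
const delegating = createDelegatingCounter();

ambient.run("B", () => {
  pinned.add(1); // still recorded on A, the instance current at creation
  delegating.add(10); // recorded on B, the ambient instance
});
```

Counters are usually created once at module load, which is exactly when only the default instance is current; re-resolving per `.add()` is what lets them follow `runWithInstance` like spans and log records do.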

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Comment thread test/internal/unit/multiInstance/delegatingMeterRouting.test.ts Outdated
Comment thread test/internal/unit/multiInstance/delegatingMeterRouting.test.ts
…g test

Add afterAll cleanup (cm.disable() + context.disable()) so the AsyncLocalStorageContextManager installed in beforeAll does not leak into other tests in the shared Vitest worker. Also document the multi-instance public API in the README.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

3 participants